Routledge 


Taylor & Francis Group 




Military Psychology 




ISSN: 0899-5605 (Print) 1532-7876 (Online) Journal homepage: http://www.tandfonline.com/loi/hmlp20 


Using Response Latency Measures for a 
Biographical Inventory 


Lawrence J. Stricker & David L. Alderton 


To cite this article: Lawrence J. Stricker & David L. Alderton (1999) Using Response Latency
Measures for a Biographical Inventory, Military Psychology, 11:2, 169-188, DOI:
10.1207/s15327876mp1102_3


To link to this article: https://doi.org/10.1207/s15327876mp1102_3


Published online: 17 Nov 2009.


Full Terms & Conditions of access and use can be found at 
http://www.tandfonline.com/action/journalInformation?journalCode=hmlp20


MILITARY PSYCHOLOGY, 11(2), 169-188
Copyright © 1999, Lawrence Erlbaum Associates, Inc. 


Using Response Latency Measures 
for a Biographical Inventory 


Lawrence J. Stricker 


Educational Testing Service 
Princeton, New Jersey 


David L. Alderton 


Navy Personnel Research and Development Center 
San Diego, California 


This study assessed the usefulness of response latency data for biographical inventory 
items in enhancing the inventory’s validity. The Armed Services Applicant Profile 
(ASAP) was administered by computer to Navy recruits, and the regular score, la- 
tency-weighted scores, and measures of deviant latencies were obtained. The la- 
tency-weighted scores did not improve the ASAP’s validity in predicting 6-month re- 
tention, when used instead of or in addition to the regular score, and the deviant 
latency measures did not function as suppressor or moderator variables to increase the 
ASAP’s validity. However, subgroups of items with differing latencies varied sys- 
tematically in their internal-consistency reliability (with increased reliability for sub- 
groups with shorter latencies), and a small subgroup of items with moderate latencies 
was almost as valid as the regular score, suggesting that latency data may be useful in 
writing and selecting inventory items. 




Recent theoretical and empirical work in personality and social psychology, cou- 
pled with the advent of computerized testing, raises the real possibility of improv- 
ing the validity of personality, interest, and biographical inventories by administer- 
ing them via computer and using information about latency of responding to the 
items to modify conventional scoring techniques. 

Response latencies on personality inventory items and personality-trait adjec- 
tives have been extensively studied since the 1970s. A key finding is that items 


Requests for reprints should be sent to Lawrence J. Stricker, Educational Testing Service, 17R, 
Princeton, NJ 08541. 




with long latencies are unstable: The responses to these items tend to change on re- 
test. In itemetric studies, latencies and the proportion of changed responses (over a 
4-week interval) correlated .21 to .41 for Minnesota Multiphasic Personality In- 
ventory (Hathaway & McKinley, 1951) items (Dunn, Lushene, & O’Neil, 1972), 
latencies and changed responses (over a 1-week period) correlated .36 for Person- 
ality Research Form (PRF; Jackson, 1984) items (Holden, Fekken, & Jackson, 
1985), and latencies and changed responses (over a 1-month period) correlated .49 
for Basic Personality Inventory (BPI; Jackson, 1989) items (Holden & Fekken, 
1990).1 In an itemetric study that used changed responses on immediate retesting,
however, latencies and the Ambdex index (Goldberg, 1963), a measure of instability,
correlated -.05 (ns) for PRF items (Rogers, 1973). In experiments on individual
differences, the PRF items that each participant changed on retest
(immediately in one experiment, after a 1-week period in the other) were predicted 
significantly better than chance on the basis of which items had the longest laten- 
cies for the participant during the initial administration (Fekken & Jackson, 1988). 

Several otherwise divergent conceptualizations are alike in suggesting that long 
latencies for inventory items reflect difficulty in responding. Some of the concep- 
tualizations are based on item characteristics, and others on the interaction be- 
tween individual differences and item characteristics (Fekken & Jackson, 1988). 
The item characteristics conceptualizations argue that the difficulty comes about 
because the item is hard to understand—unreadable, ambiguous, and so forth (e.g., 
Dunn et al., 1972; Hanley, 1962). The conceptualizations concerned with the interaction
between individual differences and item characteristics contend that the difficulty
arises because (a) the person has trouble in applying the item to himself or
herself—the item deals with matters that are unfamiliar or unknown to the person 
or the different response alternatives to the item appear equally descriptive of him 
or her (e.g., Kuncel, 1973; Markus, 1977; Rogers, 1974a, 1974b), (b) he or she is 
dissimulating in answering the item (e.g., Holden, Kroner, Fekken, & Popham, 
1992),2 or (c) the person’s emotions are aroused by the item (e.g., Gilbert, 1967;
Temple & Geisinger, 1990). 

The observed link between the response latency of personality items and the 
items’ instability implies that the items’ latencies are also associated with the 
items’ validity, given the relation between the reliability of a measure and its valid- 
ity. In particular, the lower retest reliability associated with the instability of items 
with long latencies should also be reflected in lower predictive validity. The find- 
ings in the two investigations that bear on this issue are inconsistent. In one study 


1The signs of the correlations in the Dunn, Lushene, and O'Neil (1972) and the Holden and Fekken
(1990) studies have been reflected to be consistent with the reversal of the dependent variable in these 
investigations from the proportion of unchanged responses to the proportion of changed responses. 

2One conceptualization of dissimulation contends that lying results in short, not long, latencies (Hsu, 
Santelli, & Hsu, 1989). 


This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. 




(Holden et al., 1985), latencies for PRF items correlated -.22 with a concurrent
validity criterion (a composite of self-ratings, self-reports on an adjective checklist,
and preference ratings), but in a second investigation (Holden & Fekken, 1990),
latencies for BPI items correlated -.11 (ns) with another concurrent validity criterion
(ratings by clinicians). However, the findings of these studies may be affected by 
the dichotomous format of the items: Items with extreme endorsement proportions 
tend to be more stable and less valid (Goldberg, 1963). Furthermore, these results 
were based on concurrent validity studies, and the consequences of the items’ in- 
stability would be more pronounced in predictive validity studies. 

The purpose of this study was to determine whether the findings about the con- 
nection between item latencies and item instability can be used to improve an in- 
ventory’s validity. More specifically, the main goal was to assess whether 
weighting item scores on the basis of their latencies improves the predictive valid- 
ity of the inventory’s total score. A secondary aim was to assess whether measures 
that reflect the extent to which participants’ latencies are deviant function as sup- 
pressor or moderator variables to increase the validity of the inventory’s total 
score. The notion is that deviant latencies reflect an unusual pattern of responding 
to the inventory, stemming from idiosyncratic difficulties with certain items, poor 
test-taking attitudes, and other variables that attenuate validity. Hence, using mea- 
sures of deviant latencies to suppress this invalid variance or to exploit their inter- 
action with the inventory’s score should increase validity. 


METHOD 
Overview 


Items from the Armed Services Applicant Profile (ASAP; Trent, 1993), a biograph- 
ical inventory, were administered by computer to Navy recruits (all men), and the 
participants’ response choices and response latencies were recorded. The regular 
score for the ASAP was obtained, along with a latency-based measure: regular scores for
subgroups of items with different latencies, to be optimally weighted by standard
multiple-regression methods. Measures of deviant latencies were also obtained for use
as suppressor/moderator variables. Data for the criterion, retention in the Navy for 
6 months, were subsequently secured. 


ASAP 


Background. The ASAP is designed to predict the adjustment of enlisted 
personnel to military service. The final version of the ASAP consists of two 
50-item alternate forms drawn from an initial pool of 170 heterogeneous items cho- 
sen for their potential relevance to adjustment. The items encompass six factors 




(nondelinquency, work orientation, work ethic, academic achievement, social ad- 
aptation, and athletic involvement). The items have three to five alternatives, and 
the alternatives are separately scored with weights of 1 to 3 that have been empiri- 
cally derived to predict retention at 21 months of service. The total score for a form 
is the sum of the item scores (see Trent, 1993). 

The inventory’s predictive validity against retention criteria has been exten- 
sively studied, using a cohort of applicants for active duty in all the armed services. 
Earlier 130-item forms of the ASAP correlated .18 to .20 with retention at 6 
months (T. Trent, personal communication, August 1986), and the present 50-item 
forms correlated .29 with retention to the end of enlistment—usually 48 months 
(Trent, 1993). The internal-consistency reliability was .77 for earlier 125-item 
forms (T. Trent, personal communication, December 1987) and .74 to .76 for the 
present 50-item forms (Trent, 1993). 


Items. A set of 120 items from the initial pool was available for this study. Of
the other 50 items in the initial pool, 26 had been dropped previously because they 
concerned circumstances beyond the respondent’s control, they might involve eth- 
nic or social class bias, they were intrusive, or they asked about the type of 
high-school credential (Trent, 1993). The additional 24 items were eliminated for 
this study because they duplicated remaining items. 

Minor editorial changes were made in the 120 retained items to achieve a con- 
sistent format and to eliminate unnecessary instructions (e.g., “Pick the main 
one”), and the items were arranged in random order.

Because the current item weights for the ASAP are unavailable for some of the 
120 items and are based on retention for 21 months rather than the 6-month period 
used in this study, new item weights were obtained, using the same procedures and 
the same cohort data (N = 13,172–26,857) employed in deriving the current
weights (M. A. Quennette, personal communication, January 1989; Trent, 1993) 
but for a 6-month period. 

In brief, a modification of the “horizontal percent” method (Stead & Shartle, 
1940) for deriving empirical weights for biographical items was employed. The 
percentage of applicants retained was computed for each alternative for the 120 
items. The distribution of these percentages was trichotomized, and alternatives 
with percentages in the top third were given a weight of 3, alternatives in the mid- 
dle third a weight of 2, and those in the bottom third a weight of 1. An exception 
was made for one item with an alternative indicating that the respondent did not 
graduate from high school; this alternative was assigned a weight of 1, for policy 
reasons.3 
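The weighting procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure actually run on the cohort data: the function names and the record layout are hypothetical, the thirds are formed by rank over the pooled alternatives (the article does not say how ties or uneven thirds were handled), and the policy exception for the high-school alternative is left as a manual override.

```python
from collections import defaultdict

def retention_percentages(records):
    """Percentage retained for each (item, alternative) pair.

    records: iterable of (item_id, alternative, retained) tuples,
    where retained is True if the applicant completed the period.
    """
    counts = defaultdict(lambda: [0, 0])  # (item, alt) -> [n_retained, n_total]
    for item_id, alternative, retained in records:
        pair = counts[(item_id, alternative)]
        pair[0] += int(retained)
        pair[1] += 1
    return {key: 100.0 * kept / total for key, (kept, total) in counts.items()}

def horizontal_percent_weights(percentages):
    """Trichotomize the pooled percentages by rank (an assumption):
    bottom third -> 1, middle third -> 2, top third -> 3."""
    ranked = sorted(percentages, key=percentages.get)
    n = len(ranked)
    return {key: 1 + (3 * rank) // n for rank, key in enumerate(ranked)}

# Hypothetical retention percentages for two three-alternative items.
example = {
    ("q1", "A"): 80.0, ("q1", "B"): 95.0, ("q1", "C"): 88.0,
    ("q2", "A"): 85.0, ("q2", "B"): 90.0, ("q2", "C"): 93.0,
}
weights = horizontal_percent_weights(example)
# An exception such as the one described above would be applied afterward,
# e.g. weights[("q1", "A")] = 1 for a designated alternative.
```

Because the cut points come from the pooled distribution over all alternatives, an item can end up with all of its alternatives carrying the same weight.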


3Several of the current weights were assigned on rational grounds to improve content validity; it was 
not feasible to replicate that process for this study. 




Computer administration. The paper-and-pencil version of the ASAP was 
adapted for computer administration via the same Hewlett Packard Integral Per- 
sonal Computer used in the Accelerated Computerized Adaptive Testing—Armed 
Services Vocational Aptitude Battery system (Rafacz & Hetter, 1997). The com- 
puter-adapted version of the ASAP was designed to be as similar as possible to the 
original one in all important respects. 

The computer keyboard was simplified, consisting of numerical keys for enter- 
ing the participant’s identification number; keys labeled A, B, C, D, and E for re- 
sponse choices; an Enter key; and a Help key. The participant chose a response and 
recorded it by pressing the Enter key. The response could be changed at will before 
the Enter key was pressed. After the Enter key was pressed, the next item was pre- 
sented, and the participant could not return to the previous item or earlier ones. The 
participant could seek assistance from the proctor by pressing the Help key and 
raising his hand. 

The pertinent instructions follow: 


Read each question and all of its possible answers carefully, then select the 
one answer that is best or most appropriate for you. ... You should work 
quickly but be as accurate as you can. Your answers to some of these ques- 
tions may be verified for accuracy and honesty. 


Participants were not informed that their latencies were being recorded. The 
following information was recorded for each item: 


1. The response choice. 

2. The number of times that the response was changed. 

3. The latency (in hundredths of a second) between the time that the item was 
presented and the Enter key was pressed. 

4. The number of times that the Help key was used. 


ASAP measures. The regular ASAP score (the sum of the regular item 
scores) and latency-based ASAP scores were secured. The latency-based scores 
employed standardized latencies, and two versions of each score were obtained. 
(Items for which the participant used the Help key were excluded in standardizing 
the latencies and in the latency-based scores; items for which the participant 
changed his responses were included in the standardization and in the scores be- 
cause of the prevalence and relevance of changed items.) 

One version of the scores, using double-standardized latencies to eliminate the 
main effects of individuals and items (e.g., Popham & Holden, 1990), reflected 
conceptualizations concerned with the interaction between individual differences 
and item characteristics. First, each item’s latencies were standardized within 
items to eliminate item differences associated with readability, ambiguity, and 




other characteristics. For this purpose, the “interquartile deviation” was computed
for each participant’s latency for an item: (Actual Latency - Sample Median) /
Sample Interquartile Range. Then, using these interquartile deviations, each
participant’s item latencies were standardized within participants to eliminate
individual differences associated with reading speed, reaction time, and similar
characteristics. For this purpose, the “double-standardized interquartile deviation”
was computed for each interquartile deviation (corresponding to the item latency)
for a participant: (Interquartile Deviation - Participant’s Median) / Participant’s
Interquartile Range.
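The two-step standardization can be illustrated with a short NumPy sketch. It assumes a participants-by-items array of latencies with NaN marking items excluded for a participant (e.g., Help key used); the function names are hypothetical, and NumPy's default linearly interpolated quartiles are an assumption, since the article does not specify how the quartiles were computed.

```python
import numpy as np

def iqr_deviation(x, axis):
    """(value - median) / interquartile range along `axis`, ignoring NaN."""
    median = np.nanmedian(x, axis=axis, keepdims=True)
    q75, q25 = np.nanpercentile(x, [75, 25], axis=axis, keepdims=True)
    return (x - median) / (q75 - q25)

def double_standardize(latencies):
    """Standardize within items first, then within participants."""
    lat = np.asarray(latencies, dtype=float)
    within_items = iqr_deviation(lat, axis=0)   # each column: one item, all participants
    return iqr_deviation(within_items, axis=1)  # each row: one participant, all items

def single_standardize(latencies):
    """Standardize the actual latencies within participants only."""
    return iqr_deviation(np.asarray(latencies, dtype=float), axis=1)
```

After the second step every participant's deviations have a median of 0 and an interquartile range of 1, which is what makes the resulting scores ipsative and comparable across participants.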

This double standardization is modeled after one used in previous research on 
response latencies on personality inventories (e.g., Popham & Holden, 1990) but 
differs in three respects: 


1. Item latencies are first standardized within items and then standardized 
within participants instead of vice versa. This same sequence, commonly used in 
Q-type analyses of persons (e.g., Cattell, 1952; Stephenson, 1936), ensures that the 
latency scores are purely ipsative and completely comparable from participant to 
participant. 

2. A nonparametric procedure (Tukey, 1977) was employed, instead of the 
conventional parametric procedure involving transformation to standard scores, to 
reduce the effects on the standardization of the extreme skewness in the latency 
data. 

3. Participants with very extreme latencies were dropped, and all less-extreme 
latencies for the remaining participants were retained unchanged instead of arbi- 
trarily changing all outlying latencies to within-range values. 


Note that this linear transformation does not distort the original character of the 
latency data—differences from participant to participant and from item to 
item—unlike what occurs when nonlinear transformations, such as logarithmic 
ones, are used (Pachella, 1974). 

The other version of the scores, employing single-standardized latencies to 
eliminate the main effects for individuals, reflected conceptualizations linked with 
item characteristics. Using the actual latencies, each participant’s latencies were 
standardized within participants to eliminate individual differences. The
“single-standardized interquartile deviation” was computed for each latency for a
participant: (Actual Latency - Participant’s Median) / Participant’s Interquartile
Range.

The latency-based ASAP measures consisted of “item subgroup scores”: the 
mean of the regular item scores for each of 10 subgroups of 12 items, the sub- 
groups varying in their latencies, and the items in the subgroups differing from 
participant to participant. For example, Subgroup 1 had the items with the largest
interquartile deviations (the longest latencies) for each participant, and Subgroup 




10 had the items with the smallest interquartile deviations (the shortest latencies). 
(When an item was excluded for a participant because the Help key was used, his 
Subgroup 10 had the 11 items with the smallest interquartile deviations.) Ten 
groups of items were used to achieve adequate reliability while permitting an ex- 
amination of subsets of items with extreme latencies. The scores were intended to 
be combined by multiple regression methods that weight the scores for maximum 
validity in predicting the retention criterion. 
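As a rough sketch of how the item subgroup scores could be formed for one participant, the function below ranks the items by their standardized latencies and averages the regular item scores in consecutive blocks of 12. The function name is hypothetical, and the use of NaN to flag an item excluded because of the Help key is an assumption made for the illustration.

```python
import numpy as np

def item_subgroup_scores(item_scores, deviations, n_groups=10, group_size=12):
    """Mean regular item score in each latency-ranked subgroup for one participant.

    item_scores: regular item scores (weights of 1 to 3), one per item.
    deviations: standardized latencies, one per item; NaN marks an excluded item.
    Subgroup 1 holds the longest latencies, Subgroup 10 the shortest.
    """
    item_scores = np.asarray(item_scores, dtype=float)
    deviations = np.asarray(deviations, dtype=float)
    keep = ~np.isnan(deviations)
    order = np.argsort(-deviations[keep], kind="stable")  # longest latency first
    ranked = item_scores[keep][order]
    # With one excluded item, the last slice (Subgroup 10) holds 11 items.
    return [ranked[g * group_size:(g + 1) * group_size].mean()
            for g in range(n_groups)]
```

The same function serves both versions of the scores; only the deviations passed in (double standardized or single standardized) differ.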


Deviant latency measures. Four measures of deviant latencies were also 
obtained (all excluded items for which the participant used the Help key): 


1. The product-moment correlation (transformed to Fisher’s z) between a par- 
ticipant’s actual latencies and the sample’s median actual latencies. This is an index 
of the correspondence between the participant’s and the sample’s latencies. 

2. The absolute difference between the median single-standardized
interquartile deviation for item latencies of a participant and the sample. (These are 
the interquartile deviations used to standardize item latencies within items.) This is 
an index of the difference between the participant’s and the sample’s average la- 
tencies. 

3. The participant’s interquartile range of single-standardized interquartile de- 
viations for items (the same interquartile deviations used in the previous index). 
This is an index of the variability of the participant’s latencies. 

4. The number of the participant’s double-standardized interquartile deviations
of 3.5 or more. This is an index of outlying latencies (an interquartile deviation of
3.5, which lies 3.5 interquartile ranges from the median and is equivalent to 4.7
standard deviations from the mean in a normal distribution, defines a “far out”
outlier; Tukey, 1977).
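The four indexes listed above might be computed along the following lines. This is a sketch under several assumptions rather than the original computation: latencies are held in a participants-by-items array with NaN for excluded items, the sample value in the second index is taken to be the median of the participants' medians (the text does not say exactly how it was formed), and all names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# 3.5 interquartile ranges in standard deviation units for a normal
# distribution (IQR is about 1.349 SD), i.e. roughly 4.7 SD.
FAR_OUT_SD = 3.5 * 2 * NormalDist().inv_cdf(0.75)

def iqr_dev(x, axis):
    """(value - median) / interquartile range along `axis`, ignoring NaN."""
    median = np.nanmedian(x, axis=axis, keepdims=True)
    q75, q25 = np.nanpercentile(x, [75, 25], axis=axis, keepdims=True)
    return (x - median) / (q75 - q25)

def deviant_latency_measures(latencies):
    lat = np.asarray(latencies, dtype=float)
    within_items = iqr_dev(lat, axis=0)         # standardized within items
    double_std = iqr_dev(within_items, axis=1)  # then within participants
    item_medians = np.nanmedian(lat, axis=0)    # sample median latency per item

    # 1. Fisher z of the correlation with the sample's median latencies.
    fisher_z = []
    for row in lat:
        ok = ~np.isnan(row)
        r = np.corrcoef(row[ok], item_medians[ok])[0, 1]
        fisher_z.append(np.arctanh(r))

    # 2. Gap between the participant's and the sample's median deviation
    #    (sample value = median of participant medians; an assumption).
    person_median = np.nanmedian(within_items, axis=1)
    median_gap = np.abs(person_median - np.median(person_median))

    # 3. Participant's interquartile range of within-item deviations.
    q75, q25 = np.nanpercentile(within_items, [75, 25], axis=1)

    # 4. Count of double-standardized deviations of 3.5 or more.
    n_far_out = np.sum(double_std >= 3.5, axis=1)

    return {"fisher_z": np.array(fisher_z), "median_gap": median_gap,
            "iqr": q75 - q25, "n_far_out": n_far_out}
```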


Criterion 


The criterion was completion of 6 months (i.e., 180 days) of active service (or
separation for “nonpejorative” reasons during that period: officer commission, breach
of contract by the service, death, or early release), calculated from service entry 
date. This operational definition of retention is adapted from the one used in previ- 
ous ASAP research (Trent, 1993). 


Procedures 


The ASAP, followed by one or more experimental cognitive tests (Alderton, 
Wolfe, & Larson, 1997), was computer administered to groups of approximately 30 




participants. The questionnaire about test-taking attitudes and related matters was 
completed at the end of the session. The ASAP administration took approximately 
30 min, and the entire administration took about 2½ hr. The testing room held a
battery of 34 personal computers.


Sample 


The sample consisted of 1,090 Navy recruits at the Recruit Training Center in San 
Diego. All were men (women are trained elsewhere). All recruits in the available 
units were asked to volunteer to participate in the study, but recruits who were not 
reservists with limited active-duty obligations were given preference. (These re- 
servists were not part of the study population.) The recruits were instructed that the 
test results would not affect their subsequent assignments or become part of their 
official records. 

The ASAP was administered to a total of 1,493 participants. Forty-two partici- 
pants were eliminated because information was unavailable for most or all of their 
pertinent variables. An additional 136 participants were excluded because they were 
not part of the study population: (a) They were reservists with limited active-duty 
obligations, (b) they had prior military service (or information about this matter was 
missing), (c) they took the ASAP more than 15 days after service entry (or this infor- 
mation was missing), or (d) they had a dominant language other than English.4 The 
remaining 225 participants were eliminated because of poor test-taking attitudes or 
behavior to approximate the ASAP’s use in operational settings.5 


Analyses 


Internal-consistency reliability was estimated by coefficient alpha for the regular 
ASAP score and by the intraclass correlation (Shrout & Fleiss, 1979; Case 1 for 
mean ratings) for the item subgroup scores. 
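For readers who want to see the two reliability estimates side by side, a minimal sketch follows. Both functions take a participants-by-parts array (items for coefficient alpha; the 12 item scores within a subgroup for the intraclass correlation). The function names are hypothetical; the intraclass formula is the one-way, mean-of-k-ratings form given by Shrout and Fleiss, (BMS - WMS) / BMS.

```python
import numpy as np

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha for a participants x items array."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def icc_case1_mean(scores):
    """One-way intraclass correlation for the mean of k parts (Case 1)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    # Between-participants and within-participants mean squares.
    bms = k * ((row_means - x.mean()) ** 2).sum() / (n - 1)
    wms = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (bms - wms) / bms
```

A one-way model suits the item subgroup scores because the items in a given subgroup differ from participant to participant, so the columns of the array do not correspond to fixed items.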

A series of regression analyses of the two kinds of ASAP measures were carried 
out against the retention criterion. The comparative validity of the measures was 
appraised from the zero-order correlations of the regular ASAP score and the mul- 
tiple correlation of the item subgroup scores. 


4Dominant language was assessed by the following question, which was computer administered
immediately preceding the ASAP: “What language do you read and write best? (a) English, (b) Spanish, (c)
Chinese, (d) Tagalog, (e) Some other language.” 

5This elimination was done in two stages for the 1,315 participants not already excluded. First, 122 
participants were dropped for either of two reasons: 


1. They used the Help key for more than one item (n = 7). 
2. They reported on the paper-and-pencil questionnaire that they tried “very little” in the testing ses- 
sion (either at the beginning, at the end, or overall), or this information was missing (n = 115). 




The incremental validity of the latency-based measures, when combined with 
the regular ASAP score, was assessed by hierarchical regression analyses (Cohen 
& Cohen, 1983): The zero-order correlation for the regular ASAP score was com- 
pared with the multiple correlation for the regular ASAP score plus the item sub- 
group scores. (The latter were treated as a set, and Subgroup 10 was excluded to 
avoid collinearity between the regular ASAP score and the subgroup scores.) 

The ability of the deviant latency measures to suppress or moderate the validity 
of the regular ASAP score and item subgroup scores was also assessed by hierar- 
chical regression analyses. A suppressor effect was evaluated by (a) a comparison 
of the zero-order or multiple correlation for the ASAP measure with the multiple 
correlation for the ASAP measure plus the deviant latency measure and (b) a com- 
parison of the corresponding zero-order correlation and standardized par- 
tial-regression weights for the deviant latency measure, if the first comparison 
revealed a significant difference between the two correlations. (When suppression 
exists, the regression weight for a variable falls outside the boundaries set by its 
zero-order correlation and zero; Cohen & Cohen, 1983.) These analyses were done 
separately for each ASAP measure. (In the analyses of item subgroup scores, the 
10 scores were treated as a set.) 
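The suppression rule in the parenthetical note can be made concrete for the simplest case of one ASAP measure plus one deviant latency measure. The sketch below works directly from correlations, using the standard two-predictor formulas for standardized regression weights; the function names and the correlations used to try it out are hypothetical, and the case in which the 10 subgroup scores enter as a set would need a general regression routine instead.

```python
import math

def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized partial-regression weights for two predictors."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

def shows_suppression(beta, r_zero_order):
    """True if beta falls outside the interval bounded by 0 and r_zero_order."""
    low, high = min(0.0, r_zero_order), max(0.0, r_zero_order)
    return beta < low or beta > high

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of the criterion with both predictors."""
    beta1, beta2 = two_predictor_betas(r_y1, r_y2, r_12)
    return math.sqrt(beta1 * r_y1 + beta2 * r_y2)

# Hypothetical values: predictor 1 is the ASAP measure, predictor 2 the
# deviant latency measure, which is unrelated to the criterion here.
_, beta_latency = two_predictor_betas(0.17, 0.00, 0.50)
is_suppressor = shows_suppression(beta_latency, 0.00)
```

In this made-up case the latency measure has no zero-order validity, yet it receives a negative weight and raises the multiple correlation above the ASAP measure's own validity, which is the classic suppressor pattern.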

A moderator effect was evaluated by a comparison of the multiple correlation 
of the ASAP measure and the suppressor/moderator variable with the multiple cor- 
relation for the two variables plus their product term (the latter representing the in- 
teraction between the ASAP measure and the suppressor/moderator variable). In 
common with the suppressor analyses, these moderator analyses were done sepa- 
rately for each ASAP measure, and the item subgroup scores were treated as a set. 
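A moderator comparison of this kind can be sketched with ordinary least squares: fit the criterion on the two variables, refit with their product added, and compare the multiple correlations. The names below are hypothetical, and the sketch omits the significance test on the increment that a full hierarchical analysis would include.

```python
import numpy as np

def multiple_r(predictors, y):
    """Multiple correlation of y with a list of predictor arrays (OLS)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ coef, y)[0, 1]

def moderator_comparison(score, moderator, y):
    """Return (R without product term, R with product term)."""
    score = np.asarray(score, dtype=float)
    moderator = np.asarray(moderator, dtype=float)
    r_main = multiple_r([score, moderator], y)
    r_full = multiple_r([score, moderator, score * moderator], y)
    return r_main, r_full
```

Because the second model contains the first, its multiple correlation can never be lower; the question is whether the gain is large enough to matter by the criteria set out below.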

Both statistical and practical significance were used to evaluate the results of 
the analysis. For statistical significance, the .05 level was used in all analyses. For 
practical significance, the conventional “small” effect size was used in most analy- 
ses: a product-moment correlation, intraclass correlation, or coefficient alpha of 
.10 (Cohen, 1988). However, in analysis of incremental validity, a smaller effect 
size was used: an increase in the multiple correlation (i.e., a semipartial
correlation) of .05 because of the initially low level of the regular ASAP score’s validity
(r = .18-.20 for 6-month retention; T. Trent, personal communication, August 


Second, of the remaining participants, 103 were eliminated for any of these reasons: 


1. They made more than five changes in their responses to the same item (n = 59). This corresponded 
to an interquartile deviation of 3.5 in the distribution for this variable—a “far out” outlier. 

2. They had a maximum double-standardized interquartile deviation of 10.98 or more (n = 26). This
corresponded to an interquartile deviation of 3.5 in the distribution of maximum double-standardized 
interquartile deviations—a “far out” outlier. 

3. They had a minimum actual latency of 2.21 sec or less. This was the latency by the fastest 0.5% of 
the sample to the item with the shortest latencies, a criterion for improbably short latencies associated 
with premature responding (Jensen, 1985; n = 19; 1 participant was also excluded because he had a dou- 
ble-standardized interquartile deviation of 10.98 or more). 




1986). Similarly, in analyses of the reliability and validity of item subgroup scores, 
a difference of .05 was used because of the anticipated low level of reliability (.24) 
and validity (.11) of these 12-item measures (estimated by the Spearman—Brown 
formula) from data for 125- and 130-item forms of the ASAP (T. Trent, personal 
communications, August 1986, December 1987). 
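The projection to 12-item measures can be checked with the Spearman-Brown formula. In the sketch below the reliability function is the standard formula; the validity function is the usual companion formula for a test whose length is changed by a given ratio, and its use here is an assumption, since the article says only that the Spearman-Brown formula was applied. Function names are hypothetical.

```python
import math

def spearman_brown(reliability, length_ratio):
    """Reliability of a test whose length is multiplied by length_ratio."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)

def validity_after_length_change(validity, reliability, length_ratio):
    """Validity of the lengthened or shortened test (assumed companion formula)."""
    return validity * math.sqrt(
        length_ratio / (1 + (length_ratio - 1) * reliability)
    )

# Reliability of .77 for a 125-item form, projected to 12 items.
projected = spearman_brown(0.77, 12 / 125)
print(round(projected, 2))  # 0.24
```

The anticipated validity of about .11 is in line with applying the second function to the earlier forms' 6-month validities of .18 to .20, although the exact inputs used for that figure are not stated.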


RESULTS 
Retention Criterion 


The retention rate was 91.2%: 994 participants of the 1,090 participants completed 
6 months of active service (or separated for nonpejorative reasons),6 comparable to
the 91.3% retention rate for the same time period in previous ASAP research (T. 
Trent, personal communication, August 1986). Given the extreme split in the retention
criterion, the maximum product-moment correlation with it is .57 (McNemar,
1962). 


Reliability of ASAP Measures 


The internal-consistency reliability of the regular ASAP score and the
latency-based ASAP measures is reported in Table 1. The reliability of the item
subgroup scores is also shown in Figures 1 and 2. The reliability was .80 for the regular
ASAP score, comparable to the previously reported reliability of .77 for a 125-item 
form (T. Trent, personal communication, December 1987). 

The reliability was lower for the item subgroup scores: .14 to .31 for the
double-standardized versions and .10 to .40 for the single-standardized versions. The
trends for the two kinds of scores diverged markedly. For the double-standardized
version, the reliability was noticeably lower for scores at both extremes (Subgroup
1, r = .14; Subgroup 10, r = .23). For the single-standardized version, the reliability
systematically increased from the score with the longest latencies (Subgroup 1,
r = .10) to the score with the shortest latencies (Subgroup 10, r = .40).

Because the item subgroup scores were based on 12 items, a relevant comparison
is the estimated reliability (using the Spearman–Brown formula) of .29 for the
regular ASAP score with the same number of items. None of the double-standardized
scores had appreciably higher reliability, whereas the two extreme scores had
appreciably lower reliability (Subgroup 1, r = .14; Subgroup 10, r = .23). In contrast,
the two single-standardized scores with the shortest latencies had appreciably higher
reliability than the .29 estimate (Subgroup 9, r = .37; Subgroup 10, r = .40), and the
three scores with the longest latencies plus a score with moderate latencies had
noticeably lower reliability than this estimate (Subgroup 1, r = .10; Subgroup 2,
r = .15; Subgroup 3, r = .24; Subgroup 5, r = .21).


6Two participants were separated for a nonpejorative reason: death.


Intercorrelations of ASAP Measures 


The intercorrelations of the ASAP measures appear in Table 1. The regular ASAP 
score correlated substantially with both versions of the item subgroup scores: .50 to 
.61 with the double-standardized version and .43 to .67 with the sin- 
gle-standardized version (note that these are part-whole correlations). The item 
subgroup scores correlated moderately with each other: .17 to .32 for the dou- 
ble-standardized version and .13 to .43 for the single-standardized version. All of 
these correlations are significant, both statistically and practically. 


Comparative Validity of ASAP Measures 


The correlations of the ASAP measures with the retention criterion are shown in 
Table 1. Figures 3 and 4 portray the correlations of the ASAP item subgroup scores 
with the criterion. The regular ASAP score correlated .17 with the criterion, compa- 
rable to the .18 to .20 correlations with 6-month retention reported previously (T. 
Trent, personal communication, August 1986). The multiple correlations of the 
item subgroup scores were also similar: .18 for both versions. All of these correla- 
tions are significant, both statistically and practically. 

The correlations for the individual item subgroup scores were lower: .07 to .14 
for the double-standardized version and .07 to .12 for the single-standardized ver- 
sion. Again, the trends for the two versions diverged. For the double-standardized 
version, the trend was curvilinear: The correlation was appreciably higher for a 
middle score (Subgroup 6, r = .14) and noticeably lower for the scores at the ex- 
tremes (especially Subgroup 10, r = .07). No trend was apparent for the sin- 
gle-standardized version. 

A pertinent comparison for these findings about the item subgroup scores is the 
estimated validity of .10 (using the Spearman—Brown formula) of the regular 
ASAP score for 12 items. None of the double-standardized or single-standardized 
scores had appreciably different validity than this estimate. 


Incremental Validity of Latency-Based ASAP Measures 


Neither version of the item subgroup scores produced a statistically and practically 
significant increase in the multiple correlation with the criterion when combined 
with the regular ASAP score (.18 vs. .17 for both versions), indicating that the la- 
tency-based ASAP measure did not have incremental validity. 



TABLE 1
Intercorrelations and Reliability of Regular and Latency-Based ASAP Measures
and the Retention Criterion

    Measure                            M      SD    Reliability
 1. Regular ASAP                  274.63   16.32    (80)
 2. Item Subgroup Score 1 (DS)      2.32    0.22    (14)
 3. Item Subgroup Score 2 (DS)      2.26    0.24    (27)
 4. Item Subgroup Score 3 (DS)      2.31    0.23     ..
 5. Item Subgroup Score 4 (DS)      2.25    0.24     ..
 6. Item Subgroup Score 5 (DS)      2.26    0.23     ..
 7. Item Subgroup Score 6 (DS)      2.26    0.24    (28)
 8. Item Subgroup Score 7 (DS)      2.28    0.23    (26)
 9. Item Subgroup Score 8 (DS)      2.30    0.24    (31)
10. Item Subgroup Score 9 (DS)      2.34    0.24    (26)
11. Item Subgroup Score 10 (DS)     2.45    0.24    (23)
12. Item Subgroup Score 1 (SS)        ..    0.21    (10)
13. Item Subgroup Score 2 (SS)        ..    0.22    (15)
14. Item Subgroup Score 3 (SS)        ..    0.23    (24)
15. Item Subgroup Score 4 (SS)        ..    0.23     ..
16. Item Subgroup Score 5 (SS)        ..    0.23    (21)
17. Item Subgroup Score 6 (SS)        ..    0.24    (25)
18. Item Subgroup Score 7 (SS)        ..    0.24    (26)
19. Item Subgroup Score 8 (SS)        ..    0.25    (32)
20. Item Subgroup Score 9 (SS)        ..    0.26     ..
21. Item Subgroup Score 10 (SS)       ..    0.26     ..
22. Retention criterion             0.91    0.28

Intercorrelations (row measure by column measure)

Measure    2   3   7   8  12  13  14  15  16  17  18  19
 1.       51  58  59  60  43  50  53  56  ..  59  58  62
 2.           19  23  22  23  23  28  28  31  29  29  32
 3.               26  32  22  28  26  34  31  35  36  36
 4.               27  31  23  30  30  37  35  37  33  38
 5.               25  24  25  23  25  30  34  35  33  33
 6.               23  24  22  28  30  33  37  33  37  35
 7.                   27  24  29  29  32  35  36  31  39
 8.                       28  30  34  33  36  35  35  39
 9.                       25  30  35  32  35  39  33  38
10.                       29  25  33  31  32  31  33  32
11.                       28  30  32  32  29  30  32  34
12.                           ..  21  20  20  17  13  14
13.                               15  ..  19  20  20  27
14.                                   ..  21  25  25  24
15.                                       28  23  28  23
16.                                           30  25  28
17.                                               27  31
18.                                                   29

[Entries shown as .. and all entries for Columns 4-6, 9-11, and 20-22, including the correlations with the retention criterion, are not legible in the source. One further reliability coefficient, (37), is legible and belongs to Measure 20 or 21.]

Note. N = 1090. ASAP = Armed Services Applicant Profile; DS = double standardized; SS = single
standardized. Decimal points have been omitted for correlations and reliability coefficients.
Correlations of .06 and .08 are significant at the .05 and .01 levels (two-tailed), respectively. Reliability
coefficients appear in parentheses. Reliability was estimated by coefficient alpha for the regular ASAP




FIGURE 1 Internal-consistency reliability of double-standardized item subgroup scores. [Plot of reliability (vertical axis) against item subgroup score, 1 to 10 (horizontal axis).]


FIGURE 2 Internal-consistency reliability of single-standardized item subgroup scores. [Plot of reliability (vertical axis, .00 to .50) against item subgroup score, 1 to 10 (horizontal axis).]


FIGURE 3 Predictive validity of double-standardized item subgroup scores. [Plot of validity (vertical axis) against item subgroup score, 1 to 10 (horizontal axis).]


FIGURE 4 Predictive validity of single-standardized item subgroup scores. [Plot of validity (vertical axis, .00 to .16) against item subgroup score, 1 to 10 (horizontal axis).]


Incremental Validity of Deviant Latency Measures 
as Suppressor/Moderator Variables 


None of the four deviant latency measures produced a statistically and practically 
significant increase in the multiple correlation with the criterion when combined 
with the regular score (.17 vs. .17 for each measure), and none of the four measures 
produced a significant increase in the multiple correlation when combined with 
either version of the item subgroup scores (.18-.20 vs. .18 for the double-standardized 
version and .18-.19 vs. .18 for the single-standardized version), indicating that these 
deviant latency measures did not function as suppressor variables. 

None of the deviant latency measures produced a significant increase in the 
multiple correlation with the criterion when their product score was combined 
with the regular ASAP score and the deviant latency measure (.17 vs. .17 for each 
measure), and none of the deviant latency measures produced a significant 
increase in the multiple correlation when their product score was combined with 
either version of the item subgroup scores and the deviant latency measure (.20-.21 
vs. .18-.19 for the double-standardized version; .21-.24 vs. .18-.19 for the 
single-standardized version), indicating that the deviant latency measures did not 
function as moderator variables. 
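
The suppressor and moderator analyses reported in this section both compare multiple correlations from nested regression models. A minimal sketch of that logic follows. It uses synthetic data, not the study's data; the variable names, the simulated effect size, and the use of ordinary least squares with NumPy are assumptions made for illustration, and the significance test for the increment is not shown.

```python
import numpy as np


def multiple_r(predictors: np.ndarray, criterion: np.ndarray) -> float:
    """Multiple correlation between a criterion and its OLS fit on the predictors."""
    X = np.column_stack([np.ones(len(criterion)), predictors])  # add intercept
    beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    return float(np.corrcoef(X @ beta, criterion)[0, 1])


rng = np.random.default_rng(0)
n = 1090                                        # sample size reported in Table 1
score = rng.normal(size=n)                      # stand-in for the regular ASAP score
deviance = rng.normal(size=n)                   # stand-in for a deviant latency measure
criterion = 0.17 * score + rng.normal(size=n)   # synthetic criterion (assumption)

# Step 1: regular score alone.
r_base = multiple_r(score[:, None], criterion)
# Step 2 (suppressor test): add the deviant latency measure.
r_supp = multiple_r(np.column_stack([score, deviance]), criterion)
# Step 3 (moderator test): add the product of the two predictors.
r_mod = multiple_r(np.column_stack([score, deviance, score * deviance]), criterion)

print(round(r_base, 2), round(r_supp, 2), round(r_mod, 2))
```

Because the models are nested, the multiple correlation cannot decrease from one step to the next; the substantive question is whether the increase is large enough to be statistically and practically significant.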


DISCUSSION 


It is apparent from the results that the latency-based ASAP measures did not 
improve the biographical inventory’s predictive validity when used instead of or in 
addition to the conventional ASAP score. It is equally clear that the measures of 
deviant latencies did not function as suppressor or moderator variables to enhance 
the ASAP’s validity. 

Nonetheless, some important positive findings did emerge. Consistent with the 
expectation based on previous results that items with long latencies are unstable, 
systematic trends in reliability occurred in the analysis of item subgroups with sin- 
gle-standardized latencies, with lower internal-consistency reliability for sub- 
groups of items with longer latencies. The findings were much less clear-cut in the 
reliability analysis of item subgroups with double-standardized latencies but sug- 
gested lower reliability for subgroups with either very long or very short latencies. 
This unanticipated possibility that items with unusually short latencies may also be 
unstable needs to be followed up. An obvious conjecture is that very short laten- 
cies indicate participants are paying minimal attention to the item content or, at 
worst, are responding more or less randomly. The sample was screened to elimi- 
nate participants with poor test-taking attitudes, including individuals making im- 
possibly fast responses, but this process excluded only those with extreme 
behavior. 

The trends in reliability in this analysis of single-standardized latencies support 
and extend itemetric studies that uncovered a link between long latencies and in- 
stability for personality items (Dunn et al., 1972; Holden & Fekken, 1990; Holden 
et al., 1985). Because the single-standardization procedure used eliminates only 
individual differences in average latencies, considerable commonality probably 
exists from participant to participant in the items that make up the item subgroup 
scores. These findings indicate that the earlier results about retest reliability also 
apply to internal-consistency reliability and suggest that the previous findings 
were not simply an artifact of the dichotomous character of the personality items 
(Goldberg, 1963). 

The failure of these clear-cut trends in reliability to be paralleled by similar 
trends in the validity of the subgroup scores or by increases in overall validity 
when the scores are combined may occur because of the generally low level of 
validity involved. In this circumstance, substantial increases in validity with 
improvements in the ASAP are difficult to uncover, even when reliability is 
dramatically enhanced. Follow-up studies with more predictable criteria are 
clearly in order. 

These reliability findings suggest that latency data may be useful in selecting 
items for reliability and, indirectly, for validity (Fekken & Jackson, 1988; Holden 
& Fekken, 1990). Standard item analytic methods that choose items with high cor- 
relations with the total score or with the criterion can accomplish these purposes, 
too. However, latency data may be particularly useful when (a) the measure is 
heterogeneous and, hence, item-total score correlations are of questionable value; (b) 
the criterion has limited validity; or (c) the criterion cannot become available until 
substantial time has elapsed. 

Another important finding concerns the expectation that items with long laten- 
cies are less valid. The findings in the analysis of double-standardized item sub- 
group scores suggested that items with very long latencies as well as those with 
very short latencies were less valid. Furthermore, this analysis identified a subset 
of items with moderate latencies (Subgroup 6) that were more valid than the other 
sets and almost as valid as the regular ASAP score. Indeed, the estimated validity 
of this subgroup score would be .24 (using the Spearman-Brown formula) if it had 
as many items as the regular ASAP score, which is appreciably larger than the lat- 
ter’s validity of .17. This result clearly needs to be replicated, but it offers the in- 
triguing prospect of improving the ASAP’s validity by using more of the same 
kind of items that are in this subgroup. Because the double-standardization proce- 
dure clustered items on the basis of their participant by item interactions, it is un- 
likely that appreciable commonality exists from participant to participant in the 
items that make up this subgroup. Consequently, it would probably be necessary to 
identify the appropriate items individually for each participant, using computer- 
ized adaptive testing. How accurately such items can be identified remains to be 
seen. 

One other outcome is noteworthy. The similar validity of the ASAP regular 
score and the latency-based measures indicates that the unorthodox methods used 
to devise the latter (the item subgroup scores that rely on comparable scores from 
different sets of heterogeneous items) did not degrade the ASAP’s validity. This 
outcome implies that these unusual procedures were reasonable. 

All in all, the findings offer mixed support for the competing conceptualiza- 
tions reflected in the two kinds of latency-based measures: (a) individual differ- 
ences by item characteristics interaction, represented by the double-standardized 
measures; and (b) item characteristics, represented by the single-standardized 
measures. The most clear-cut confirmation stems from the reliability findings for 
the single-standardized item subgroup scores and supports the item characteristics 
conceptualization. 
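
To make the distinction between the two conceptualizations concrete, the sketch below contrasts the two standardizations on a small synthetic participant-by-item latency matrix. It assumes that single standardization z-scores each participant's latencies across items and that double standardization additionally z-scores the result within each item across participants; the exact procedure used in the study may differ, and all names and data here are hypothetical.

```python
import numpy as np


def standardize_within_participant(latency: np.ndarray) -> np.ndarray:
    """z-score each row (participant) across items."""
    mean = latency.mean(axis=1, keepdims=True)
    sd = latency.std(axis=1, keepdims=True)
    return (latency - mean) / sd


def double_standardize(latency: np.ndarray) -> np.ndarray:
    """Row-standardize, then z-score each column (item) across participants."""
    z = standardize_within_participant(latency)
    return (z - z.mean(axis=0, keepdims=True)) / z.std(axis=0, keepdims=True)


rng = np.random.default_rng(1)
person_speed = rng.normal(0.0, 1.0, size=(6, 1))   # hypothetical participant effect
item_effect = rng.normal(0.0, 1.0, size=(1, 8))    # hypothetical item effect
noise = rng.normal(0.0, 0.5, size=(6, 8))          # participant-by-item variation
latency = 5.0 + person_speed + item_effect + noise

single = standardize_within_participant(latency)
double = double_standardize(latency)

# Single standardization removes participant means but keeps item effects.
# Double standardization also removes item means, leaving mainly the
# participant-by-item variation.
print(single.mean(axis=1).round(6))  # participant means are approximately 0
print(double.mean(axis=0).round(6))  # item means are approximately 0
```

Under these assumptions, items keep a largely common ordering across participants after single standardization, whereas after double standardization the ordering is specific to each participant.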

It should be recognized in this connection that the empirical keying of the 
ASAP items hampered the ability of the single-standardization procedure to 
improve validity. Insofar as the participants’ item latencies are in the same order and 
the present sample is comparable to the one used in deriving the item scores, 
latency data will not improve validity because the items already have optimal 
weights for predicting the criterion. For instance, suppose that the items with the 
longest latencies for everyone in the sample were also the least valid. The weights 
for the items reflect the level of validity for the sample, and adjusting the weights 
in the same way for each participant (because all participants have the same 
latencies) will have no effect. The ASAP’s empirical keying does not affect the 
reliability results for the single-standardized procedure because the items’ weights are not 
optimal for reliability. The keying also does not affect either the validity or 
reliability results for the double-standardized procedure because this standardization 
makes the latencies and the resulting adjustments different for each participant. 

In summary, this initial effort at using response latency data to improve the va- 
lidity of a biographical inventory directly was unsuccessful, but there were strong 
indications that employing these data in developing an inventory may enhance va- 
lidity indirectly and thereby accomplish the same goal. It should also be borne in 
mind that closely related work has directly improved the validity of personality in- 
ventories by the use of response latency data. Several recent studies have found 
that latency scores for a scale (i.e., the mean latency for endorsed items on the scale 
and the mean latency for rejected items on the scale) frequently had incremental 
validity in predicting external criteria when combined with the regular scale score 
(Holden, Fekken, & Cotton, 1991; Mervielde, 1988; Popham & Holden, 1990; 
Siem, 1996). This particular approach requires items that have dichotomous 
response alternatives and homogeneous content (Popham & Holden, 1990) and, 
hence, is inapplicable to a heterogeneous biographical inventory with several, 
separately scored alternatives, such as the ASAP, necessitating the use of other 
methods for combining latency data with scale scores.² Nonetheless, the findings with 
personality inventories underscore the potential for latency data. Given the ease of 
collecting response latency information, its ability to improve the validity and util- 
ity of self-report inventories merits serious investigation. 


ACKNOWLEDGMENTS 


This article was prepared under the Navy Manpower, Personnel, and Training 
R & D Program of the Office of Chief of Naval Research under Contract 
N00014-89-K-0072. Thanks to John J. Pass for encouraging this research, 
Thomas Trent for assisting in all stages of the study, John H. Wolfe for facilitating 
data collection, Mary A. Quennette for providing information about the screening 
and scoring of ASAP items, Gerald E. Larson for supplying the questionnaire on 
test-taking attitudes, Rebecca Redard and Thomas Sheridan for programming the 
computer-administered version of the ASAP, Michael Alvarez and Mark Knapp 
for administering the ASAP, Mike Dove for furnishing retention data, Robert F. 
Boldt and Donald A. Rock for advising on psychometric and statistical issues, 
Lucient C. Chan and Annette Turner for doing the computer programming for the 
data analysis, John J. Ferris and Judith Pollack for supervising this programming, 
and Robert F. Boldt and Philip K. Oltman for reviewing a draft of this article. 


²The critical distinction between endorsing or rejecting an item is inherent in items with dichotomous 
alternatives (e.g., “I like hiking: true, false”). A true or yes response indicates the presence (or absence) 
of the characteristics, and a false or no response indicates the opposite. However, this distinction is 
inapplicable to items with multiple alternatives, especially when the alternatives are not on a continuum 
(e.g., “Which of these school subjects did you find hardest? English, Mathematics, Physical 
Education”). Choosing one alternative does not necessarily mean the opposite of choosing another alternative. 
Alternatives that are not on a continuum may reflect entirely different things, and alternatives on a 
continuum may represent different degrees of the same thing. 


REFERENCES 


Alderton, D. L., Wolfe, J. H., & Larson, G. E. (1997). The ECAT battery. Military Psychology, 9, 5-37. 

Cattell, R. B. (1952). The three basic factor-analytic research designs—Their interrelations and deriva- 
tives. Psychological Bulletin, 49, 499-520. 

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Law- 
rence Erlbaum Associates, Inc. 

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sci- 
ences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. 

Dunn, T. G., Lushene, R. E., & O'Neil, H. F., Jr. (1972). Complete automation of the MMPI and a study 
of its response latencies. Journal of Consulting and Clinical Psychology, 39, 381-387. 

Fekken, G. C., & Jackson, D. N. (1988). Predicting consistent psychological test item responses: A 
comparison of models. Personality and Individual Differences, 19, 873-882. 

Gilbert, A. R. (1967). Increased diagnostic value of the Taylor Manifest Anxiety Scale by use of re- 
sponse latency. Psychological Reports, 20, 63-67. 

Goldberg, L. R. (1963). A model of item ambiguity in personality assessment. Educational and Psycho- 
logical Measurement, 23, 467-492. 

Hanley, C. (1962). The “difficulty” of a personality inventory item. Educational and Psychological 
Measurement, 22, 577-584. 

Hathaway, S. R., & McKinley, J. C. (1951). Minnesota Multiphasic Personality Inventory manual. New 
York: Psychological Corporation. 

Holden, R. R., & Fekken, G. C. (1990). Structured psychopathological test item characteristics and 
validity. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 2, 35-40. 

Holden, R. R., Fekken, G. C., & Cotton, D. H. G. (1991). Assessing psychopathology using structured 
test-item response latencies. Psychological Assessment: A Journal of Consulting and Clinical 
Psychology, 3, 111-118. 

Holden, R. R., Fekken, G. C., & Jackson, D. N. (1985). Structured personality test item characteristics 
and validity. Journal of Research in Personality, 19, 386-394. 

Holden, R. R., Kroner, D. G., Fekken, G. C., & Popham, S. M. (1992). A model of personality test item 
response dissimulation. Journal of Personality and Social Psychology, 63, 272-279. 

Hsu, L. M., Santelli, J., & Hsu, J. R. (1989). Faking detection validity and incremental validity of re- 
sponse latencies to MMPI subtle and obvious items. Journal of Personality Assessment, 53, 
278-295. 

Jackson, D. N. (1984). Personality Research Form manual. Port Huron, MI: Research Psychologists 
Press. 




Jackson, D. N. (1989). Basic Personality Inventory manual. Port Huron, MI: Research Psychologists 
Press. 

Jensen, A. R. (1985). Methodological and statistical techniques for the chronometric study of mental 
abilities. In C. R. Reynolds & V. L. Wilson (Eds.), Methodological and statistical advances in the 
study of individual differences (pp. 51-116). New York: Plenum. 

Kuncel, R. B. (1973). Response processes and relative location of subject and item. Educational and 
Psychological Measurement, 33, 545-563. 

Markus, H. (1977). Self-schemata and processing information about the self. Journal of Personality and 
Social Psychology, 35, 63-78. 

McNemar, Q. (1962). Psychological statistics (3rd ed.). New York: Wiley. 

Mervielde, I. (1988). Cognitive processes and computerized personality assessment. European Journal 
of Personality, 2, 97-111. 

Pachella, R. G. (1974). The interpretation of reaction time in information-processing research. In B. H. 
Kantowitz (Ed.), Human information processing: Tutorials in performance and cognition (pp. 
41-82). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. 

Popham, S. M., & Holden, R. R. (1990). Assessing MMPI constructs through the measurement of 
response latencies. Journal of Personality Assessment, 54, 469-478. 

Rafacz, B., & Hetter, R. D. (1997). ACAP hardware selection, software development, and acceptance 
testing. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: 
From inquiry to operation (pp. 145-156). Washington, DC: American Psychological Association. 

Rogers, T. B. (1973). Toward a definition of the difficulty of a personality item. Psychological Reports, 
33, 159-166. 

Rogers, T. B. (1974a). An analysis of the stages underlying the process of responding to personality 
items. Acta Psychologica, 38, 205-214. 

Rogers, T. B. (1974b). An analysis of two central stages underlying responding to personality items: The 
self-referent decision and response selection. Journal of Research in Personality, 8, 128-138. 

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. 
Psychological Bulletin, 86, 420-428. 

Siem, F. M. (1996). The use of response latencies to enhance self-report personality measures. Military 
Psychology, 8, 15-27. 

Stead, W. H., & Shartle, C. L. (1940). Occupational counseling techniques. New York: American Book. 

Stephenson, W. (1936). The foundations of psychometry: Four factor systems. Psychometrika, 1, 
195-209. 

Temple, D. E., & Geisinger, K. F. (1990). Response latency to computer-administered inventory items 
as an indicator of emotional arousal. Journal of Personality Assessment, 54, 288-297. 

Trent, T. (1993). The Armed Services Applicant Profile (ASAP). In T. Trent & J. H. Laurence (Eds.), 
Adaptability screening for the Armed Forces (pp. 71-99). Washington, DC: Department of Defense, 
Office of Assistant Secretary of Defense (Force Management and Personnel). 

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley. 


